<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Anscombe's Quartet of 'Identical' Simple Linear Regressions</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>

<table width="100%" summary="page for anscombe"><tr><td>anscombe</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>Anscombe's Quartet of &lsquo;Identical&rsquo; Simple Linear Regressions</h2>

<h3>Description</h3>

<p>Four <i>x</i>-<i>y</i> datasets that share nearly identical traditional
summary statistics (mean, variance, correlation, regression line,
etc.), yet look quite different when plotted.
</p>


<h3>Usage</h3>

<pre>anscombe</pre>


<h3>Format</h3>

<p>A data frame with 11 observations on 8 variables.
</p>

<table summary="Rd table">
<tr>
 <td style="text-align: right;">
    x1 == x2 == x3 </td><td style="text-align: left;"> the integers 4:14, specially arranged </td>
</tr>
<tr>
 <td style="text-align: right;">
    x4             </td><td style="text-align: left;"> the value 8 (ten times) and the value 19 (once) </td>
</tr>
<tr>
 <td style="text-align: right;">
    y1, y2, y3, y4 </td><td style="text-align: left;"> numbers in (3, 13), each with mean 7.5 and standard deviation 2.03</td>
</tr>

</table>



<h3>Source</h3>

<p>Tufte, Edward R. (1989).
<em>The Visual Display of Quantitative Information</em>, 13&ndash;14.
Graphics Press.
</p>


<h3>References</h3>

<p>Anscombe, Francis J. (1973).
Graphs in statistical analysis.
<em>The American Statistician</em>, <b>27</b>, 17&ndash;21.
doi: <a href="http://doi.org/10.2307/2682899">10.2307/2682899</a>.
</p>


<h3>Examples</h3>

<pre>
require(stats); require(graphics)
summary(anscombe)
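
## A hedged sketch (not part of the original example): check the layout
## described under 'Format' directly against the data.
dim(anscombe)                                # observations, variables
with(anscombe, all(x1 == x2, x2 == x3))      # are x1, x2 and x3 the same?
sort(anscombe$x1)                            # the distinct x values, in order
table(anscombe$x4)                           # how often each x4 value occurs
sapply(anscombe[paste0("y", 1:4)], range)    # min and max of each y column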

##-- now some "magic" to do the 4 regressions in a loop:
ff &lt;- y ~ x
mods &lt;- setNames(as.list(1:4), paste0("lm", 1:4))
for(i in 1:4) {
  ff[2:3] &lt;- lapply(paste0(c("y","x"), i), as.name)
  ## or   ff[[2]] &lt;- as.name(paste0("y", i))
  ##      ff[[3]] &lt;- as.name(paste0("x", i))
  mods[[i]] &lt;- lmi &lt;- lm(ff, data = anscombe)
  print(anova(lmi))
}

## See how close they are (numerically!)
sapply(mods, coef)
lapply(mods, function(fm) coef(summary(fm)))
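
## A hedged sketch (not part of the original example): compare the summary
## statistics named in the Description, one column per dataset.
## 'stats4' is a hypothetical name introduced here for illustration.
stats4 &lt;- sapply(1:4, function(i) {
  x &lt;- anscombe[[paste0("x", i)]]
  y &lt;- anscombe[[paste0("y", i)]]
  c(mean.x = mean(x), var.x = var(x),
    mean.y = mean(y), var.y = var(y), cor.xy = cor(x, y))
})
colnames(stats4) &lt;- paste0("set", 1:4)
round(stats4, 2)   # the four columns should agree closely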

## Now, do what you should have done in the first place: PLOTS
op &lt;- par(mfrow = c(2, 2), mar = 0.1+c(4,4,1,1), oma =  c(0, 0, 2, 0))
for(i in 1:4) {
  ff[2:3] &lt;- lapply(paste0(c("y","x"), i), as.name)
  plot(ff, data = anscombe, col = "red", pch = 21, bg = "orange", cex = 1.2,
       xlim = c(3, 19), ylim = c(3, 13))
  abline(mods[[i]], col = "blue")
}
mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex = 1.5)
par(op)
</pre>


</body></html>
